Skip to content

libnvme: internalize uring dispatch and fix NULL hdl dereference#3343

Open
martin-belanger wants to merge 3 commits intolinux-nvme:masterfrom
martin-belanger:fix-passthru-null-hdl
Open

libnvme: internalize uring dispatch and fix NULL hdl dereference#3343
martin-belanger wants to merge 3 commits intolinux-nvme:masterfrom
martin-belanger:fix-passthru-null-hdl

Conversation

@martin-belanger
Copy link
Copy Markdown

Problem

libnvme_ctrl_get_transport_handle() uses lazy-open semantics: it opens /dev/nvmeX on first use. When the device is mid-teardown (kernel returns EAGAIN), libnvme_open() fails and the handle stays NULL. Three callers passed that NULL directly into libnvme_submit_admin_passthru(), which dereferenced it immediately at the hdl->uring_enabled check — causing a segfault reproducible with nvme-stas under rapid device create/delete cycling.

Fix

Per your suggestion, the NULL guard is now centralized in ioctl.c rather than scattered across call sites.

While moving it there, the io_uring vs. ioctl routing was also pulled out of callers and into libnvme_submit_admin_passthru() itself. libnvme_wait_complete_passthru() is now a true no-op (return 0) in the no-uring build, so all callers can pair submit + wait unconditionally. The batching behaviour in libnvme_get_log() is preserved.

Question

@igaw - Should we also add if (!hdl) return -ENODEV; with a LIBNVME_LOG_DEBUG message at the three call sites that obtain hdl from libnvme_ctrl_get_transport_handle() (nvme_discovery_log, nvmf_dim, libnvme_ctrl_identify)? This would provide a traceable log entry identifying the exact code path that hit the NULL, which is useful for diagnosing the race. Happy to add that as a follow-up commit if you think it's worthwhile.

Testing

Verified with nvme-stas tests that rapidly create and delete NVMe devices — no crash after this patch.

@martin-belanger
Copy link
Copy Markdown
Author

This PR replaces: #3341

Comment thread libnvme/src/nvme/nvme-cmds.c
@igaw
Copy link
Copy Markdown
Collaborator

igaw commented May 8, 2026

And I forgot to mention we also would need this for libnvme_submit_io_passthru too to make it really work.

Martin Belanger added 3 commits May 8, 2026 14:59
Move the io_uring vs. ioctl routing decision out of callers and into
libnvme_submit_admin_passthru() itself.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
Every caller of libnvme_submit_admin_passthru() must follow up with
libnvme_wait_complete_passthru() to drain io_uring completions before
inspecting response data.

Functions with two sequential submits where the second depends on the
first's response (libnvme_set_etdas, libnvme_clear_etdas,
libnvme_get_uuid_list) get an intermediate wait between the two submits.

libnvme_get_log() already had a conditional wait guarded by
hdl->uring_enabled; make it unconditional now that
libnvme_wait_complete_passthru() is a true no-op in the no-uring build.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
libnvme_ctrl_get_transport_handle() can return NULL when libnvme_open()
fails during rapid device teardown. Add a NULL check at the top of both
libnvme_submit_admin_passthru() and libnvme_submit_io_passthru() so the
condition is caught centrally rather than causing a NULL dereference in
callers.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
@martin-belanger martin-belanger force-pushed the fix-passthru-null-hdl branch from 2fee4a7 to 16dc0db Compare May 8, 2026 19:37
@martin-belanger
Copy link
Copy Markdown
Author

@igaw, I was thinking of introducing a helper function, libnvme_admin_passthru() (name is up for discussion), to simplify blocks of code like the one below, which appear in several places in the code.

	err = libnvme_submit_admin_passthru(hdl, &cmd);
	if (err)
		return err;
	err = libnvme_wait_complete_passthru(hdl);
	if (err)
		return err;

This could be simplified as:

	err = libnvme_admin_passthru(hdl, &cmd);
	if (err)
		return err;

Here's what the helper would look like:

  static inline int libnvme_admin_passthru(struct libnvme_transport_handle *hdl,
          struct libnvme_passthru_cmd *cmd)
  {
      int err = libnvme_submit_admin_passthru(hdl, cmd);
      return err ? err : libnvme_wait_complete_passthru(hdl);
  }

I also noticed that libnvme_wait_complete_passthru() is currently a private function while libnvme_submit_admin_passthru() is public. Since they act as a pair, I think that both should be public, and the helper should be public as well in ioctl.h.

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants